import pandas as pd
import os
import sys
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import missingno as msno
from dataprep.eda import create_report
import sweetviz as sv
from autoviz.AutoViz_Class import AutoViz_Class
# Locate the data directory, which sits next to the current working directory.
general_directory = os.path.split(os.getcwd())[0]
data_location = os.path.join(general_directory, "data")
dataset = "trainingsetvalues.csv"
datasetlabels = "trainingsetlabels.csv"
# Load the feature values and their labels, then join them on the shared "id" column.
df = pd.read_csv(os.path.join(data_location, dataset))
df1 = pd.read_csv(os.path.join(data_location, datasetlabels))
values_with_labels = pd.merge(left=df, right=df1, on="id")
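The merge on "id" assumes every row in the values file has exactly one matching row in the labels file. A minimal sketch of how that assumption could be checked, using small made-up frames (the column names `feature` and `label` are hypothetical stand-ins, not columns from the real files):

```python
import pandas as pd

# Toy stand-ins for the values and labels tables (hypothetical data).
values = pd.DataFrame({"id": [1, 2, 3], "feature": [0.0, 25.0, 50.0]})
labels = pd.DataFrame({"id": [1, 2, 3], "label": ["a", "b", "a"]})

# validate="one_to_one" raises pandas.errors.MergeError if "id" repeats on either side.
# how="outer" with indicator=True adds a "_merge" column that marks rows
# present in only one of the two tables.
merged = pd.merge(
    values, labels, on="id", how="outer", validate="one_to_one", indicator=True
)

# Rows that failed to find a partner in the other table.
unmatched = merged[merged["_merge"] != "both"]
print(f"{len(merged)} rows merged, {len(unmatched)} without a match")
```

If `unmatched` is empty and no `MergeError` is raised, a plain inner merge like the one above loses no rows.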
Imported AutoViz_Class version: 0.0.84. Call using:
AV = AutoViz_Class()
AV.AutoViz(filename, sep=',', depVar='', dfte=None, header=0, verbose=0,
lowess=False,chart_format='svg',max_rows_analyzed=150000,max_cols_analyzed=30)
Note: verbose=0 or 1 generates charts and displays them in your local Jupyter notebook.
verbose=2 does not show plot but creates them and saves them in AutoViz_Plots directory in your local machine.
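AV = AutoViz_Class()
# Run AutoViz on the values file only; the labels are not passed as depVar here.
# Note: this rebinds df (the raw values frame loaded above) to whatever AutoViz returns.
df = AV.AutoViz(os.path.join(data_location, dataset))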
Shape of your Data Set loaded: (59400, 40)
############## C L A S S I F Y I N G V A R I A B L E S ####################
Classifying variables in data set...
Number of Numeric Columns = 3
Number of Integer-Categorical Columns = 6
Number of String-Categorical Columns = 19
Number of Factor-Categorical Columns = 0
Number of String-Boolean Columns = 2
Number of Numeric-Boolean Columns = 0
Number of Discrete String Columns = 8
Number of NLP String Columns = 0
Number of Date Time Columns = 0
Number of ID Columns = 1
Number of Columns to Delete = 1
40 Predictors classified...
This does not include the Target column(s)
10 variables removed since they were ID or low-information variables
3 numeric variables in data exceeds limit, taking top 30 variables
Number of All Scatter Plots = 6
Time to run AutoViz (in seconds) = 18.656
###################### VISUALIZATION Completed ########################
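The log above reports that 10 variables were dropped as ID or low-information columns. AutoViz's own criteria are not shown in this output, so the following is only a rough approximation of that idea in plain pandas: flag columns that are constant, or that have a distinct value on every row and are not floats. The helper name `low_information_columns` and the toy frame are hypothetical.

```python
import pandas as pd


def low_information_columns(frame):
    """Return columns that are constant or look like per-row identifiers."""
    flagged = []
    n_rows = len(frame)
    for col in frame.columns:
        n_unique = frame[col].nunique(dropna=False)
        if n_unique <= 1:
            # A single repeated value carries no information.
            flagged.append(col)
        elif n_unique == n_rows and not pd.api.types.is_float_dtype(frame[col]):
            # Every row distinct and not a continuous float: likely an ID.
            flagged.append(col)
    return flagged


# Toy frame (hypothetical data) with one ID-like and one constant column.
toy = pd.DataFrame(
    {
        "id": [10, 11, 12, 13],
        "constant": ["x", "x", "x", "x"],
        "category": ["a", "b", "a", "b"],
        "measure": [0.1, 0.2, 0.3, 0.4],
    }
)
print(low_information_columns(toy))
```

Running the same helper on the values frame would give a first guess at which columns AutoViz may have excluded, though the two lists need not agree.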